Results 1 - 4 of 4
1.
Sci Data ; 11(1): 358, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38594314

ABSTRACT

This paper presents a standardised dataset versioning framework for improved reusability, recognition and data version tracking, facilitating comparisons and informed decision-making for data usability and workflow integration. The framework adopts a software engineering-like data versioning nomenclature ("major.minor.patch") and incorporates data schema principles to promote reproducibility and collaboration. To quantify changes in statistical properties over time, the concept of data drift metrics (d) is introduced. Three metrics (d_P, d_{E,PCA}, and d_{E,AE}) based on unsupervised Machine Learning techniques (Principal Component Analysis and Autoencoders) are evaluated for dataset creation, update, and deletion. The optimal choice is the d_{E,PCA} metric, which combines PCA models with splines. It has an efficient computational time, yields values below 50 for new dataset batches, and gives values consistent with seasonal or trend variations. Major updates (i.e., values of 100) occur when scaling transformations are applied to over 30% of variables, whereas information loss is handled efficiently, yielding values close to 0. This metric achieved a favourable trade-off between interpretability, robustness against information loss, and computation time.


Subject(s)
Datasets as Topic, Software, Principal Component Analysis, Reproducibility of Results, Workflow, Datasets as Topic/standards, Machine Learning
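The "major.minor.patch" nomenclature mentioned above follows the software convention in which a higher-level bump resets the lower fields. The sketch below illustrates only that labelling rule; the class name `DatasetVersion` and its methods are hypothetical, and the choice of level is left to the caller, because the abstract does not give the drift thresholds that would map a value of d to a patch, minor, or major update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical "major.minor.patch" label for a dataset, in software style."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "DatasetVersion":
        # Accept exactly three dot-separated non-negative integers.
        parts = text.strip().split(".")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"expected 'major.minor.patch', got {text!r}")
        major, minor, patch = (int(p) for p in parts)
        return cls(major, minor, patch)

    def bump(self, level: str) -> "DatasetVersion":
        # Software-style rule: bumping a field resets every field below it.
        # Which level applies (e.g. from a drift value d) is decided by the caller.
        if level == "major":
            return DatasetVersion(self.major + 1, 0, 0)
        if level == "minor":
            return DatasetVersion(self.major, self.minor + 1, 0)
        if level == "patch":
            return DatasetVersion(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown level {level!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"
```

For example, `DatasetVersion.parse("1.4.2").bump("minor")` gives `1.5.0`, and `bump("major")` on the same label gives `2.0.0`.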
2.
Front Bioeng Biotechnol ; 11: 1104445, 2023.
Article in English | MEDLINE | ID: mdl-36741754

ABSTRACT

One of the most common sources of information in Synthetic Biology is data from plate reader fluorescence measurements. These experiments measure the light emitted by a fluorescent molecule, such as the Green Fluorescent Protein (GFP). However, these measurements are generally expressed in arbitrary units and are affected by the gain of the measurement device. This limits the range of measurements in a single experiment and hampers the comparison of results among experiments. In this work, we describe PLATERO, a calibration protocol to express fluorescence measurements in concentration units of a reference fluorophore. The protocol removes the effect of the device gain on the acquired data. In addition, the fluorescence intensity values are transformed into concentration units using a Fluorescein calibration model. Both steps are combined in a single mathematical expression that returns normalized, gain-independent, and comparable data, even if the data were acquired at different device gain levels. Most importantly, PLATERO embeds a Linearity and Bias Analysis that assesses the uncertainty of the model estimations, and a Reproducibility and Repeatability analysis that evaluates the sources of variability arising from the measurements and the equipment. All the functions used to build the model, apply it to new data, and perform the uncertainty and variability assessment are available in an open access repository.
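To make the idea of converting arbitrary fluorescence units into reference-fluorophore concentration concrete, the sketch below fits a generic linear standard curve (ordinary least squares) at a single gain and inverts it. This is an assumption-laden illustration, not PLATERO's model: it does not reproduce the gain-removal step or the combined expression described above, and the function names and the example readings are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def to_concentration(fluorescence, intercept, slope):
    """Invert the standard curve: arbitrary units -> reference concentration.

    Assumes a linear response at one fixed gain; gain correction is not modelled here.
    """
    return (fluorescence - intercept) / slope


# Hypothetical fluorescein standards (concentration) and readings (arbitrary units).
standards = [0.0, 1.0, 2.0, 4.0]
readings = [10.0, 110.0, 210.0, 410.0]
intercept, slope = fit_line(standards, readings)
```

With these made-up standards the fit gives an intercept of 10 and a slope of 100, so a reading of 310 maps to a concentration of 3 in the standards' units.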

3.
PLoS One ; 17(9): e0274171, 2022.
Article in English | MEDLINE | ID: mdl-36137106

ABSTRACT

The clinical course of COVID-19 is highly variable. It is therefore essential to predict, as early and accurately as possible, the severity of the disease in a COVID-19 patient admitted to hospital. This means identifying the factors that contribute to mortality and developing an easy-to-use score that could enable a fast assessment of mortality risk using only information recorded at hospitalization. A large database of adult patients with a confirmed diagnosis of COVID-19 (n = 15,628; 2,846 deceased) admitted to Spanish hospitals between December 2019 and July 2020 was analyzed. Using multiple machine learning algorithms, we developed models that could accurately predict their mortality. We used the classifiers' performance metrics, together with the importance of and coherence among the predictors, to define a mortality score that can be easily calculated from a minimal number of mortality predictors and that yields accurate estimates of the patient's severity status. The optimal predictive model encompassed five predictors (age, oxygen saturation, platelets, lactate dehydrogenase, and creatinine) and yielded a satisfactory classification of survived and deceased patients (area under the curve: 0.8454 on the validation set). These five predictors were also used to define a mortality score for COVID-19 patients at hospitalization. This score is easy not only to calculate but also to interpret: it ranges from zero to eight, and the mortality risk increases linearly from 0% to 80% across that range. A simple risk score based on five commonly available clinical variables of adult COVID-19 patients admitted to hospital can accurately discriminate their mortality probability, and its interpretation is straightforward and useful.


Subject(s)
COVID-19, Adult, COVID-19/diagnosis, Creatinine, Hospital Mortality, Hospitalization, Humans, Lactate Dehydrogenases, Machine Learning, Retrospective Studies, Risk Assessment
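The abstract gives only the endpoints of the score-to-risk relation (0% at a score of zero, 80% at a score of eight) and states that the increase is linear, which works out to roughly 10 percentage points per score point. The sketch below encodes that reading; it assumes an integer score, and the function name is hypothetical. The cutoffs and point values for the five predictors are not given in the abstract, so the sketch starts from an already computed score rather than inventing them.

```python
def approximate_mortality_risk(score: int) -> float:
    """Approximate mortality risk (as a fraction) for a score from 0 to 8.

    Assumes the straight line from 0% at score 0 to 80% at score 8 reported
    in the abstract; this is a reading of the summary, not the fitted model.
    """
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 8:
        raise ValueError("score must be an integer from 0 to 8")
    # Each score point adds 80% / 8 = 10 percentage points.
    return 0.80 * score / 8
```

Under this reading, a score of 4 corresponds to an approximate mortality risk of 0.4 (40%).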
4.
Front Med (Lausanne) ; 9: 842991, 2022.
Article in English | MEDLINE | ID: mdl-35433768

ABSTRACT

Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS), a chronic disease characterized by long-lasting, persistent, debilitating widespread fatigue and post-exertional malaise, is still diagnosed by clinical criteria. Our group and others have identified differentially expressed miRNA profiles in the blood of patients. However, their diagnostic power, individually or in combination, seems limited. A Partial Least Squares-Discriminant Analysis (PLS-DA) model, initially based on 817 variables (two demographic, 34 blood analytic, 136 PBMC miRNAs, 639 Extracellular Vesicle (EV) miRNAs, and six EV features), selected an optimal number of five components and a subset of 32 regressors showing statistically significant discriminant power. The presence of four EV features (size and z-values of EVs prepared with or without proteinase K treatment) among the 32 regressors suggested that blood vesicles carry relevant disease information. To further explore the features of ME/CFS EVs, we subjected them to Raman micro-spectroscopic analysis, identifying carotenoid peaks as ME/CFS fingerprints, possibly due to erythrocyte deficiencies. Although PLS-DA analysis showed a limited capacity of Raman fingerprints for diagnosis (AUC = 0.7067), Raman data served to refine the number of PBMC miRNAs from our previous model while still ensuring a perfect classification of subjects (AUC = 1). Further investigations are warranted to evaluate model performance in extended cohorts of patients, to identify the precise ME/CFS EV components detected by Raman, and to reveal their functional significance in the disease.
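The AUC values quoted above (0.7067 for Raman fingerprints, 1 for the refined miRNA model) can be read as the probability that a randomly chosen patient receives a higher classifier score than a randomly chosen control. The sketch below computes AUC from that pairwise (Mann-Whitney) definition, counting ties as one half. It is a generic illustration, not the authors' PLS-DA pipeline; the function name is hypothetical, and the scores would be whatever discriminant output a model produces.

```python
def roc_auc(scores, labels):
    """AUC as P(random positive scores higher than random negative); ties count 1/2.

    labels: 1 for cases (e.g. patients), 0 for controls.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every case with every control (O(n * m), fine for small cohorts).
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A perfect separation, such as scores `[0.9, 0.8, 0.3, 0.2]` with labels `[1, 1, 0, 0]`, gives 1.0, matching the "perfect classification" case; swapping in `[0.9, 0.4, 0.5, 0.2]` gives 0.75, because one of the four case-control pairs is misordered.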
